ABSTRACT: The use of pointers and pointer-based data structures results in circular memory references that are interpreted by a vital compiler analysis, namely pointer analysis. For a pair of memory references at a program point, a typical pointer analysis specifies whether the points-to relation between them may exist, definitely does not exist, or definitely exists. The "may be" case, which describes the points-to relation for most pairs, cannot be exploited by most compiler optimizations, because these optimizations must remain sound. However, the "may be" case can be capitalized on by the modern class of speculative optimizations if the probability that two memory references alias can be measured. Focusing on multithreading, a prevailing programming technique, this paper presents a new flow-sensitive technique for probabilistic pointer analysis of multithreaded programs. The proposed technique has the form of a type system and calculates the probability of every points-to relation at each program point. The key to our approach is to calculate the points-to information via a post-type derivation. The use of type systems has the advantage of associating each analysis result with a justification (proof) of its correctness. This justification has the form of a type derivation and is required in applications such as certified code.

KEYWORDS: Static analysis. Speculative optimizations. Probabilistic alias analysis. Distributed programs. 
Semantics of multithreaded programs. Type systems. 

INTRODUCTION 

Multithreading is enjoying growing interest and is becoming a prevailing programming technique. The use of multiple threads has several advantages: (a) concealing the delay of commands such as reading from secondary storage, (b) improving the performance of programs, such as web servers, that run on multiprocessors, (c) building complex user-interface systems, and (d) simplifying the organization of large code bases. However, the static analysis of multithreaded programs is intricate because of the possible interaction between multiple threads.

Pointers are among the effective tools of modern programming languages; they empower the coding of intricate data structures. The uncertainty of pointer values at compile time not only complicates program analysis but also slows program compilation by compelling program optimization and analysis to be conservative. Pointer analysis of programs is a challenging problem in which researchers have traded space and time costs for precision. However, binary decision diagrams have been used to ease the difficulty of this trade-off.

At any program point and for every pair of memory references, a traditional pointer analysis figures out whether one of these references may point to, definitely points to, or definitely does not point to the other reference. For most pairs of memory references, the points-to relation is of type "may be". This is especially the case for techniques that prefer speed over accuracy. Traditional optimization techniques are not robust enough to treat the cases "may be" and "definitely" differently. The idea behind speculative optimization is to exploit the "may be" case, especially if the probability of "may be" can be specifically quantified.

Pointer analysis is among the most important program analyses of multithreaded programs. Pointer analysis of multithreaded programs has many applications: (a) mechanical binding of file operations that are in abeyance, (b) optimizations for memory systems such as prefetching and relocating remote data calculations, (c) equipping compilers with the information necessary for optimizations such as common subexpression elimination and induction variable elimination, and (d) easing the development of complex software engineering tools such as program slicers and race detectors.

This paper presents a new technique for pointer analysis of multithreaded programs. The proposed technique is probabilistic; for every program point it computes the probability of every points-to relation. Building on a type system, the proposed approach is control-flow-sensitive. The key to the presented analysis is to calculate probabilities for points-to relations through the compositional use of the inference rules of a type system. The proposed technique associates with every analysis a proof (type derivation) of the correctness of the analysis.

One common approach to the static analysis of programs is the algorithmic style. The technique proposed in this paper, however, has the form of a type system. The algorithmic style does not reflect how the analysis results are obtained because it works on control-flow graphs of programs, not on phrase structures as type systems do. Therefore the type-systems approach is well suited to applications that require a justification (proof) of the correctness of the analysis results together with each individual analysis. An example of such applications is certified code. What contributes to the suitability of type systems for producing such proofs is the relative simplicity of their inference rules. This simplicity is a much appreciated property in applications that require justifications. In the type-systems approach, the justifications take the form of type derivations.



1.  a := &c;
2.  if (...) then b := &c
3.  else b := &d;
4.  par{
5.    {a := &c}
6.    {a := &d}
7.  };
8.  while (...)
9.    if (...) then e := &d
10.   else e := 5;

Fig. 1 A motivating example.



Motivation 

Figure 1 presents a motivating example of our work. This example uses three pointer variables (a, b, and e) that point at two variables (c and d). We suppose that (i) the condition of the if statement at line 2 is true with probability 0.6, (ii) the condition of the if statement at line 9 is true with probability 0.5, and (iii) the loop at line 8 iterates at most 100 times. This statistical and probabilistic information can be obtained using edge profiling. In the absence of edge profiling, heuristics can be used. The work presented in this paper aims at introducing a probabilistic pointer analysis that produces results like those in Figure 2. The aim is also to associate each such pointer-analysis result with a justification for the correctness of the result. This justification takes the form of a type derivation in our proposed technique, which is based on a type system.

Contributions

Contributions of this paper are the following:

1. A new pointer analysis technique, which is probabilistic and flow-sensitive, for multithreaded programs.

2. A new probabilistic operational semantics for multithreaded programs.

Organization

The remainder of the paper is organized in three sections as follows. The first of these sections presents a simple language equipped with parallel and pointer constructs. This section also presents a new probabilistic operational semantics for the constructs of the language that we study. The second of these sections introduces a type system to carry out probabilistic pointer analysis of parallel programs. This involves introducing suitable notions for pointer types, a subtyping relation, and a detailed proof of the soundness of the proposed type system w.r.t. the semantics presented in the paper. Related work is reviewed in the last section of the paper.

Program point          Pointer information

first point            $\{t \mapsto \emptyset \mid t \in Var\}$

between lines 1 & 2    $\{a \mapsto \{(c', 1)\},\ t \mapsto \emptyset \mid a \neq t\}$

point between 3 & 4    $\{a \mapsto \{(c', 1)\},\ b \mapsto \{(c', 0.6), (d', 0.4)\},\ t \mapsto \emptyset \mid t \notin \{a, b\}\}$

point between 7 & 8    $\{a \mapsto \{(c', 0.5), (d', 0.5)\},\ b \mapsto \{(c', 0.6), (d', 0.4)\},\ t \mapsto \emptyset \mid t \notin \{a, b\}\}$

last point             $\{a \mapsto \{(c', 0.5), (d', 0.5)\},\ e \mapsto \{(d', \ldots)\},\ b \mapsto \{(c', 0.6), (d', 0.4)\},\ t \mapsto \emptyset \mid t \notin \{a, b, e\}\}$

Fig. 2 Results of pointer analysis of program in Figure 1.

PROBABILISTIC OPERATIONAL 
SEMANTICS 

This section presents the programming language we study and a probabilistic operational semantics for its constructs. We build our language (Figure 3) on the while language, originally presented by Hoare in 1969, by equipping it with commands dealing with pointers and parallel computations. The parallel concepts dealt with in our language are fork-join, conditionally spawned threads, and parallel loops. These concepts are represented by the commands par, par-if, and par-for, respectively. States of our proposed operational semantics are defined as follows:

Definition 1 1. $Addrs = \{x' \mid x \in Var\}$ and $Val = \mathbb{Z} \cup Addrs$.

2. $\gamma \in \Gamma = Var \rightarrow Val$.

3. $state \in States = \{(\gamma, p) \mid \gamma \in \Gamma \wedge p \in [0, 1]\} \cup \{abort\}$.

Typically, a state is a function from the set of variables to the set of values (integers). In our work, we enrich the set of values with a set of symbolic addresses and enrich each state with a probability that measures how likely the state is to be reached. The abort state captures any unsafe de-reference, i.e. de-referencing a variable that contains no address. We assume that the set of program variables, Var, is finite.

Except that arithmetic and Boolean operations are not allowed on pointers, the semantics of arithmetic and Boolean expressions are defined as usual (Figure 4). The inference rules of Figure 5 define the transition relation $\rightsquigarrow$ of our operational semantics.

We notice that none of the assignment statements changes the probability component of a given pre-state to produce the corresponding post-state. The symbol $p_{if}$ used in the inference rules of the if statement denotes a number in [0, 1] and measures the probability that the condition of the statement is true. This probabilistic information can be obtained using edge profiling. In the absence of edge profiling, heuristics can be used.

The par command is the main parallel concept. This concept is also known as cobegin-coend or fork-join. The execution of this command amounts to starting the concurrent execution of the threads of the command at the beginning of the construct and then waiting for the completion of these executions at the end of the construct. Then the subsequent command can be executed. The inference rule (par-sem) approximates the execution methodology of the par command. The probability $p'$ in the rule (par-sem) is multiplied by $\frac{1}{n!}$ (not by $\frac{1}{n}$ as the reader may expect) because the permutation $\theta$ selects one of the $n!$ ways in which the threads can be ordered and then executed. As an example, the reader may consider applying the rule (par-sem) when $n = 3$ and the threads are $S_1 : a := b + c$, $S_2 : b := a \times c$, and $S_3 : c := a - b$. The semantics of the par-if and par-for commands are defined using that of the par command.
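As a rough illustration of the (par-sem) rule, the following Python sketch enumerates the 3! = 6 orderings of the three example threads and weights each sequential run by 1/n!. The initial values of a, b, and c and the names `THREADS` and `run_par` are assumptions made for illustration; they do not come from the paper.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# Threads of the example: S1: a := b + c, S2: b := a * c, S3: c := a - b.
THREADS = {
    "S1": lambda s: {**s, "a": s["b"] + s["c"]},
    "S2": lambda s: {**s, "b": s["a"] * s["c"]},
    "S3": lambda s: {**s, "c": s["a"] - s["b"]},
}

def run_par(state, p):
    """Return one (order, final_state, probability) triple per permutation."""
    weight = Fraction(1, factorial(len(THREADS)))  # the 1/n! factor of (par-sem)
    results = []
    for order in permutations(THREADS):
        s = dict(state)
        for name in order:  # sequential run S_theta(1); ...; S_theta(n)
            s = THREADS[name](s)
        # Assignments leave the probability component unchanged, so p' = p.
        results.append((order, s, weight * p))
    return results

if __name__ == "__main__":
    for order, final, prob in run_par({"a": 1, "b": 2, "c": 3}, Fraction(1)):
        print(order, final, prob)
```

Each of the six orderings may end in a different final state, which is why the rule quantifies over the permutation rather than fixing one execution order.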

PROBABILISTIC POINTER ANALYSIS 

The purpose of a typical pointer analysis is to assign to every program point a points-to function. The domain of this function is the set of all pairs of pointers and the codomain is the set {definitely exists, definitely does not exist, may exist}. The codomain describes the points-to relation
$n \in \mathbb{Z}$, $x \in Var$, and $\oplus \in \{+, -, \times\}$

$e \in Aexprs ::= x \mid n \mid e_1 \oplus e_2$

$b \in Bexprs ::= true \mid false \mid \neg b \mid e_1 = e_2 \mid e_1 \leq e_2 \mid b_1 \wedge b_2 \mid b_1 \vee b_2$

$S \in Stmts ::= x := e \mid x := \&y \mid *x := e \mid x := *y \mid skip \mid S_1; S_2 \mid if\ b\ then\ S_t\ else\ S_f \mid while\ b\ do\ S_t \mid par\{\{S_1\}, \ldots, \{S_n\}\} \mid par\text{-}if\{(b_1, S_1), \ldots, (b_n, S_n)\} \mid par\text{-}for\{S\}$.

Fig. 3 The programming language.



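$[\![n]\!]\gamma = n$, $[\![\&x]\!]\gamma = x'$, $[\![x]\!]\gamma = \gamma(x)$, $[\![true]\!]\gamma = true$, $[\![false]\!]\gamma = false$.

$[\![*x]\!]\gamma = \gamma(y)$ if $\gamma(x) = y'$, and $!$ otherwise.

$[\![e_1 \oplus e_2]\!]\gamma = [\![e_1]\!]\gamma \oplus [\![e_2]\!]\gamma$ if $[\![e_1]\!]\gamma, [\![e_2]\!]\gamma \in \mathbb{Z}$, and $!$ otherwise.

$[\![\neg b]\!]\gamma = {!}$ if $[\![b]\!]\gamma = {!}$, and $\neg [\![b]\!]\gamma$ otherwise.

$[\![e_1 = e_2]\!]\gamma = {!}$ if $[\![e_1]\!]\gamma = {!}$ or $[\![e_2]\!]\gamma = {!}$; $true$ if $[\![e_1]\!]\gamma = [\![e_2]\!]\gamma$; and $false$ otherwise.

$[\![e_1 \leq e_2]\!]\gamma = {!}$ if $[\![e_1]\!]\gamma \notin \mathbb{Z}$ or $[\![e_2]\!]\gamma \notin \mathbb{Z}$, and $[\![e_1]\!]\gamma \leq [\![e_2]\!]\gamma$ otherwise.

For $\circ \in \{\wedge, \vee\}$: $[\![b_1 \circ b_2]\!]\gamma = {!}$ if $[\![b_1]\!]\gamma = {!}$ or $[\![b_2]\!]\gamma = {!}$, and $[\![b_1]\!]\gamma \circ [\![b_2]\!]\gamma$ otherwise.

Fig. 4 Semantics of arithmetic and Boolean expressions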

between pairs of memory references. For most of the pointer pairs, the points-to relation is "may exist". This is especially the case for pointer analysis techniques that give priority to speed over accuracy. The common drawback of most existing program optimization techniques is that they cannot treat the "may exist" and "definitely does not exist" cases differently. Speculative optimizations are meant to overcome this disadvantage by working on the results of analyses that can measure the probability that a points-to relation exists between two pointers.

This section presents a new technique for probabilistic pointer analysis of multithreaded programs. The technique has the form of a type system, and its goal is to accurately calculate the likelihood of every points-to relation at each program point. The advantages of the proposed technique include the simplicity of the inference rules of the type system and the fact that no dependence profile information (information describing dependencies between threads) is required. Dependence profile information, required by some multithreading techniques, is expensive to obtain. The proposed technique is flow-sensitive. The key to our technique is to calculate points-to probabilities via a post-type derivation for a given program using the bottom points-to type as a pre-type.

The following definition presents some notation used in the rest of the paper.

Definition 2 1. $Addrs = \{x' \mid x \in Var\}$ and $Addrs_p = Addrs \times [0, 1]$.

2. $Pre\text{-}PTS = \{pts \mid pts : Var \rightarrow 2^{Addrs_p}\ \text{s.t.}\ \forall y \in Var.\ (y', p_1), (y', p_2) \in pts(x) \Longrightarrow p_1 = p_2\}$.

3. For $pts \in Pre\text{-}PTS$ and $x \in Var$, $\Sigma_{pts}\, x = \sum_{(z', p) \in pts(x)} p$.

4. For every $pts \in Pre\text{-}PTS$ and $x \in Var$, $A_{pts}(x) = \{z' \mid \exists p > 0.\ (z', p) \in pts(x)\}$.

5. For $A \subseteq Addrs_p$, $pts \in Pre\text{-}PTS$, and $0 \leq q \leq 1$,

(a) $A \times q = \{(y', p \times q) \mid (y', p) \in A\}$.

(b) $pts \times q$ is the function defined by $(pts \times q)(x) = pts(x) \times q$.

We note that the set of symbolic addresses $Addrs$ is enriched with probabilities to form the set $Addrs_p$. In line with real situations, the condition on the elements of Pre-PTS excludes maps that assign the same address to a variable with two different probabilities. The notation $\Sigma_{pts}\, x$ denotes the probability that the variable x has an address with respect to pts. The
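$$\frac{[\![e]\!]\gamma = {!}}{x := e : (\gamma, p) \rightsquigarrow abort} \qquad \frac{[\![e]\!]\gamma \neq {!}}{x := e : (\gamma, p) \rightsquigarrow (\gamma[x \mapsto [\![e]\!]\gamma], p)}$$

$$\frac{\gamma(x) = z' \quad z := e : (\gamma, p) \rightsquigarrow state}{*x := e : (\gamma, p) \rightsquigarrow state} \qquad \frac{\gamma(x) \notin Addrs}{*x := e : (\gamma, p) \rightsquigarrow abort}$$

$$\frac{}{x := \&y : (\gamma, p) \rightsquigarrow (\gamma[x \mapsto y'], p)} \qquad \frac{\gamma(y) \notin Addrs}{x := *y : (\gamma, p) \rightsquigarrow abort} \qquad \frac{\gamma(y) = z' \quad x := z : (\gamma, p) \rightsquigarrow (\gamma', p)}{x := *y : (\gamma, p) \rightsquigarrow (\gamma', p)}$$

$$\frac{}{skip : (\gamma, p) \rightsquigarrow (\gamma, p)} \qquad \frac{S_1 : (\gamma, p) \rightsquigarrow abort}{S_1; S_2 : (\gamma, p) \rightsquigarrow abort} \qquad \frac{S_1 : (\gamma, p) \rightsquigarrow (\gamma'', p'') \quad S_2 : (\gamma'', p'') \rightsquigarrow state}{S_1; S_2 : (\gamma, p) \rightsquigarrow state}$$

$$\frac{[\![b]\!]\gamma = {!}}{if\ b\ then\ S_t\ else\ S_f : (\gamma, p) \rightsquigarrow abort}$$

$$\frac{[\![b]\!]\gamma = true \quad S_t : (\gamma, p) \rightsquigarrow abort}{if\ b\ then\ S_t\ else\ S_f : (\gamma, p) \rightsquigarrow abort} \qquad \frac{[\![b]\!]\gamma = true \quad S_t : (\gamma, p) \rightsquigarrow (\gamma', p')}{if\ b\ then\ S_t\ else\ S_f : (\gamma, p) \rightsquigarrow (\gamma', p_{if} \times p')}$$

$$\frac{[\![b]\!]\gamma = false \quad S_f : (\gamma, p) \rightsquigarrow abort}{if\ b\ then\ S_t\ else\ S_f : (\gamma, p) \rightsquigarrow abort} \qquad \frac{[\![b]\!]\gamma = false \quad S_f : (\gamma, p) \rightsquigarrow (\gamma', p')}{if\ b\ then\ S_t\ else\ S_f : (\gamma, p) \rightsquigarrow (\gamma', (1 - p_{if}) \times p')}$$

$$\frac{[\![b]\!]\gamma = {!}}{while\ b\ do\ S_t : (\gamma, p) \rightsquigarrow abort} \qquad \frac{[\![b]\!]\gamma = false}{while\ b\ do\ S_t : (\gamma, p) \rightsquigarrow (\gamma, p)} \qquad \frac{[\![b]\!]\gamma = true \quad S_t : (\gamma, p) \rightsquigarrow abort}{while\ b\ do\ S_t : (\gamma, p) \rightsquigarrow abort}$$

$$\frac{[\![b]\!]\gamma = true \quad S_t : (\gamma, p) \rightsquigarrow (\gamma'', p'') \quad while\ b\ do\ S_t : (\gamma'', p'') \rightsquigarrow state}{while\ b\ do\ S_t : (\gamma, p) \rightsquigarrow state}$$

• Fork-join:

$$\frac{(\exists \theta : \{1, \ldots, n\} \rightarrow \{1, \ldots, n\}).\ S_{\theta(1)}; S_{\theta(2)}; \ldots; S_{\theta(n)} : (\gamma, p) \rightsquigarrow (\gamma', p')}{par\{\{S_1\}, \ldots, \{S_n\}\} : (\gamma, p) \rightsquigarrow (\gamma', \frac{1}{n!} \times p')}\ (par\text{-}sem)$$

$$\frac{(\exists \theta : \{1, \ldots, n\} \rightarrow \{1, \ldots, n\}).\ S_{\theta(1)}; S_{\theta(2)}; \ldots; S_{\theta(n)} : (\gamma, p) \rightsquigarrow abort}{par\{\{S_1\}, \ldots, \{S_n\}\} : (\gamma, p) \rightsquigarrow abort}$$

• Conditionally spawned threads:

$$\frac{par\{\{if\ b_1\ then\ S_1\ else\ skip\}, \ldots, \{if\ b_n\ then\ S_n\ else\ skip\}\} : (\gamma, p) \rightsquigarrow state}{par\text{-}if\{(b_1, S_1), \ldots, (b_n, S_n)\} : (\gamma, p) \rightsquigarrow state}$$

• Parallel loops:

$$\frac{\exists n.\ par\{\underbrace{\{S\}, \ldots, \{S\}}_{n\text{-times}}\} : (\gamma, p) \rightsquigarrow state}{par\text{-}for\{S\} : (\gamma, p) \rightsquigarrow state}$$

Fig. 5 Inference rules of the semantics.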



notation $A_{pts}(x)$ denotes the set of addresses that have a non-zero probability of getting into x. The multiplication operations of Definition 2.5 are necessary to join many points-to types (each with a different probability) into one type.
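The notation of Definition 2 can be sketched in Python as follows. This is a minimal sketch, assuming that a pre-points-to type is encoded as a dictionary from variable names to dictionaries of the form {address: probability}; the encoding and the function names are illustrative and are not part of the paper.

```python
from fractions import Fraction

# Encoding assumption: pts maps each variable to {address: probability}.
# One dictionary per variable enforces the Pre-PTS condition that an
# address is never recorded with two different probabilities.

def sigma(pts, x):
    """Sum of the probabilities recorded for x (Definition 2.3)."""
    return sum(pts.get(x, {}).values())

def addresses(pts, x):
    """A_pts(x): the addresses with non-zero probability (Definition 2.4)."""
    return {z for z, p in pts.get(x, {}).items() if p > 0}

def scale_set(A, q):
    """A x q: multiply every probability in A by q (Definition 2.5a)."""
    return {z: p * q for z, p in A.items()}

def scale(pts, q):
    """pts x q, applied point-wise to every variable (Definition 2.5b)."""
    return {x: scale_set(A, q) for x, A in pts.items()}

if __name__ == "__main__":
    pts = {"a": {"c'": Fraction(1)},
           "b": {"c'": Fraction(3, 5), "d'": Fraction(2, 5)}}
    print(sigma(pts, "b"), addresses(pts, "b"), scale(pts, Fraction(1, 2)))
```

Exact fractions are used instead of floating-point numbers so that sums such as 0.6 + 0.4 compare equal to 1.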

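A formalization of the concepts of the set of points-to types PTS, the subtyping relation $\leq$, and the relation $\models\ \subseteq \Gamma \times PTS$ is given in the following definition.

Definition 3 1. $PTS = \{pts \in Pre\text{-}PTS \mid \forall x \in Var.\ \Sigma_{pts}\, x \leq 1\}$.

2. $pts \leq pts' \stackrel{def}{\Longleftrightarrow} \forall x.\ A_{pts}(x) \subseteq A_{pts'}(x)$.

3. $pts \equiv pts' \stackrel{def}{\Longleftrightarrow} \forall x.\ A_{pts}(x) = A_{pts'}(x)$.

4. $(\gamma, p) \models pts \stackrel{def}{\Longleftrightarrow} (\forall x.\ \gamma(x) \in Addrs \Longrightarrow \exists q > 0.\ (\gamma(x), q) \in pts(x))$.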

A way to calculate an upper bound for a set of n points-to types is introduced in the following definition.

Definition 4 Suppose $pts_1, \ldots, pts_n$ is a sequence of n points-to types and $0 \leq q_1, \ldots, q_n \leq 1$ is a sequence of n numbers whose sum is less than or equal to 1. Then $\nabla((pts_1, q_1), \ldots, (pts_n, q_n)) : Var \rightarrow 2^{Addrs_p}$ is the function defined by:

$$\nabla((pts_1, q_1), \ldots, (pts_n, q_n))(x) = \Big\{(z', p) \ \Big|\ (\exists i.\ z' \in A_{pts_i}(x)) \wedge \Big(p = \sum_{(z', p_k) \in pts_k(x)} q_k \times p_k\Big)\Big\}.$$

We note that the order of the points-to lattice is point-wise inclusion. However, probabilities are implicitly taken into account in the definition of the supremum, which is based on Definition 4. Involving the probabilities of points-to relations in the definition of the order relation would complicate the formula for calculating the lattice supremum. This complication is undesirable and, moreover, introducing probabilities apparently does not improve the results of the type system. The definition of $(\gamma, p) \models pts$ makes sure that a variable that has an address under $\gamma$ is allowed (with positive probability) to contain the same address under pts. As for Definition 4, we can interpret the elements of the sequence $q_1, \ldots, q_n$ as weights for the elements of the sequence $pts_1, \ldots, pts_n$, respectively. Therefore the map $\nabla((pts_1, q_1), \ldots, (pts_n, q_n))$ joins $pts_1, \ldots, pts_n$ into one type with respect to the weights.
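The weighted upper bound of Definition 4 can be sketched in Python as below. The dictionary encoding of points-to types and the name `join` are assumptions for illustration. The usage example joins the two branch post-types of the if statement at lines 2 and 3 of Figure 1 with the weights 0.6 and 0.4 assumed in the Motivation.

```python
from fractions import Fraction

def join(weighted_types, variables):
    """Weighted upper bound of Definition 4.

    weighted_types: list of (pts, q) pairs, where pts maps a variable to
    {address: probability} and q is the weight of that type.
    """
    result = {}
    for x in variables:
        merged = {}
        for pts, q in weighted_types:
            for z, p in pts.get(x, {}).items():
                if p > 0:  # only addresses in A_pts(x) contribute
                    merged[z] = merged.get(z, 0) + q * p
        result[x] = merged
    return result

if __name__ == "__main__":
    # Post-types of the two branches: b := &c (true) and b := &d (false).
    pts_t = {"a": {"c'": Fraction(1)}, "b": {"c'": Fraction(1)}}
    pts_f = {"a": {"c'": Fraction(1)}, "b": {"d'": Fraction(1)}}
    print(join([(pts_t, Fraction(3, 5)), (pts_f, Fraction(2, 5))], ["a", "b"]))
```

Under these assumptions the result maps b to c' with probability 3/5 and to d' with probability 2/5, which agrees with the entry for the point between lines 3 and 4 in Figure 2.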

The following lemma proves that the upper bound of the previous definition is indeed a points-to type.

Lemma 1 The map $\nabla((pts_1, q_1), \ldots, (pts_n, q_n))$ of the previous definition is a points-to type.

Proof: Suppose that $\nabla((pts_1, q_1), \ldots, (pts_n, q_n))(x) = \{(z_1', t_1), (z_2', t_2), \ldots, (z_m', t_m)\}$. To show the required we need to show that (a) $\forall j.\ 0 \leq t_j \leq 1$ and (b) $0 \leq \sum_j t_j \leq 1$. Since (b) implies (a), it is enough to show (b). Suppose that $\forall 1 \leq i \leq n$, $pts_i(x) = \{(z_1', p_{1i}), (z_2', p_{2i}), \ldots, (z_m', p_{mi})\}$, where $\forall 1 \leq j \leq m$, $p_{ji} = 0$ if $z_j' \notin A_{pts_i}(x)$. Then according to Definition 4 the values $t_1, \ldots, t_m$ can be equivalently calculated by the matrix multiplication of Figure 6. Then

$$\sum_j t_j = \Big(\sum_i q_i \times p_{1i}\Big) + \Big(\sum_i q_i \times p_{2i}\Big) + \ldots + \Big(\sum_i q_i \times p_{mi}\Big)$$

$$= \Big(q_1 \times \sum_j p_{j1}\Big) + \Big(q_2 \times \sum_j p_{j2}\Big) + \ldots + \Big(q_n \times \sum_j p_{jn}\Big).$$

We note that $\forall i$, $0 \leq \sum_j p_{ji} \leq 1$ by definition of $pts_i$ and $\forall i$, $0 \leq q_i \leq 1$. Since the weights $q_1, \ldots, q_n$ sum to at most 1 (Definition 4), this last summation is at most 1. □
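As a small worked instance of the computation in the proof, consider joining two types that map a variable to $z_1' = c'$ and to $z_2' = d'$, respectively, with weights $q_1 = 0.6$ and $q_2 = 0.4$. These numbers reuse the branch probabilities assumed for the if statement at line 2 of Figure 1 and are chosen here only for illustration:

```latex
\begin{pmatrix} t_1 \\ t_2 \end{pmatrix}
=
\begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}
\begin{pmatrix} q_1 \\ q_2 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 0.6 \\ 0.4 \end{pmatrix}
=
\begin{pmatrix} 0.6 \\ 0.4 \end{pmatrix},
\qquad
\sum_j t_j = q_1 (p_{11} + p_{21}) + q_2 (p_{12} + p_{22}) = 0.6 + 0.4 = 1 \leq 1 .
```

Here each column of the matrix sums to at most 1 because each joined type is a points-to type, and the weights sum to at most 1, so the bound of part (b) holds.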

Lemma 2 Suppose that $A = \{pts_1, \ldots, pts_n\} \subseteq PTS$ and $pts = \nabla((pts_1, \frac{1}{n}), \ldots, (pts_n, \frac{1}{n}))$. Then with respect to the definitions of the subtyping relation, the equality relation, and $\nabla$ introduced in Definitions 3.2, 3.3, and 4, respectively, the set PTS is a complete lattice where $\nabla A = pts$.

Proof: Clearly pts is an upper bound for A. Moreover, for every x, $A_{pts}(x) = \bigcup_i A_{pts_i}(x)$. Therefore pts is the least upper bound of A. □

The inference rules of our proposed type system for probabilistic pointer analysis are shown in Figure 7.

The judgment of an arithmetic expression has the form $e : pts \rightarrow A$. The intuition (Lemma 3) of this judgment is that any address that e evaluates to in a state of type pts is included in the set A as the first component of a pair whose second component is a non-zero probability. The judgment for a statement S has the form $S : pts \rightarrow pts'$ and guarantees that if the execution of S in a state of type pts terminates then the reached state is of type pts'. This is proved in Theorem 1.

Concerning the inference rules, some comments are in order. In the rule $(:= *^{prob})$, since there are n possible ways to modify x, the post-type is calculated from the pre-type by assigning x its value according to the upper bound of the n ways. The upper bound is considered so that the analysis covers all possible executions of the statement. In the rule $(* :=^{prob})$, there are n variables, $\{z_1, \ldots, z_n\}$, that have a chance of getting modified. This produces n post-types in the preconditions of the rule. Therefore the post-type is calculated from the pre-type by assigning each of the n variables its image under the upper bound of the n post-types. In the rule $(if^{prob})$, p is the probability that the condition of the if statement is true. The rule $(par^{prob})$ has this form so that the analysis result of any thread $S_i$ of the par statement takes into account the fact that any other thread may have been executed before the thread at hand. As is the case in the operational semantics, the rules for conditionally spawned threads $(par\text{-}if^{prob})$ and parallel loops $(par\text{-}for^{prob})$ are built on the rule $(par^{prob})$. In the following we give an example of the application of the rule $(par^{prob})$. Let:
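$$\begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_m \end{pmatrix} = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mn} \end{pmatrix} \begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{pmatrix}$$

Fig. 6 A matrix multiplication needed in the proof of Lemma 1.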

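Judgments for expressions, $e : pts \rightarrow A$:

$$\frac{}{n : pts \rightarrow \emptyset} \qquad \frac{}{x : pts \rightarrow pts(x)} \qquad \frac{}{e_1 \oplus e_2 : pts \rightarrow \emptyset}$$

Judgments for statements, $S : pts \rightarrow pts'$:

$$\frac{e : pts \rightarrow A}{x := e : pts \rightarrow pts[x \mapsto A]}\ (:=^{prob}) \qquad \frac{}{skip : pts \rightarrow pts}\ (skip^{prob})$$

$$\frac{pts(y) = \{(z_1', p_1), \ldots, (z_n', p_n)\} \quad \forall i.\ x := z_i : pts \rightarrow pts_i}{x := *y : pts \rightarrow pts[x \mapsto \nabla((pts_1, p_1), \ldots, (pts_n, p_n))(x)]}\ (:= *^{prob})$$

$$\frac{pts(x) = \{(z_1', p_1), \ldots, (z_n', p_n)\} \quad \forall z_i' \in A_{pts}(x).\ z_i := e : pts \rightarrow pts_i}{*x := e : pts \rightarrow pts[z_i \mapsto \nabla((pts, 1 - p_i), (pts_i, p_i))(z_i) \mid z_i' \in A_{pts}(x)]}\ (* :=^{prob})$$

$$\frac{}{x := \&y : pts \rightarrow pts[x \mapsto \{(y', 1)\}]}\ (:= \&^{prob}) \qquad \frac{S_1 : pts \rightarrow pts'' \quad S_2 : pts'' \rightarrow pts'}{S_1; S_2 : pts \rightarrow pts'}\ (seq^{prob})$$

$$\frac{S_t : pts \rightarrow pts_t \quad S_f : pts \rightarrow pts_f}{if\ b\ then\ S_t\ else\ S_f : pts \rightarrow \nabla((pts_t, p), (pts_f, 1 - p))}\ (if^{prob})$$

$$\frac{\forall i.\ S_i : \nabla((pts, 1/n), (pts_j, 1/n) \mid j \neq i) \rightarrow pts_i}{par\{\{S_1\}, \ldots, \{S_n\}\} : pts \rightarrow \nabla((pts_1, 1/n), \ldots, (pts_n, 1/n))}\ (par^{prob})$$

$$\frac{par\{\{if\ b_1\ then\ S_1\ else\ skip\}, \ldots, \{if\ b_n\ then\ S_n\ else\ skip\}\} : pts \rightarrow pts'}{par\text{-}if\{(b_1, S_1), \ldots, (b_n, S_n)\} : pts \rightarrow pts'}\ (par\text{-}if^{prob})$$

$$\frac{\exists n.\ par\{\underbrace{\{S\}, \ldots, \{S\}}_{n\text{-times}}\} : pts \rightarrow pts'}{par\text{-}for\{S\} : pts \rightarrow pts'}\ (par\text{-}for^{prob})$$

$$\frac{n = 0}{while\ b\ do\ S_t : pts \rightarrow pts}\ (whl_1^{prob}) \qquad \frac{n \geq 1 \quad \forall 1 \leq i \leq n.\ S_t : pts \rightarrow^{i} pts_i}{while\ b\ do\ S_t : pts \rightarrow \nabla((pts_1, 1/n), \ldots, (pts_n, 1/n))}\ (whl_2^{prob})$$

$$\frac{pts_1' \leq pts_1 \quad S : pts_1 \rightarrow pts_2 \quad pts_2 \leq pts_2'}{S : pts_1' \rightarrow pts_2'}\ (csq^{prob})$$

Fig. 7 The inference rules for the type system for probabilistic pointer analysis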



• $S_1$ : if $b_1$ then $x := \&y$ else $x := 5$,

• $S_2$ : $x := \&z$,

• $S_{par}$ : par{{$S_1$}, {$S_2$}},

• $pts = \{t \mapsto \emptyset \mid t \in Var\}$,

• $pts_1 = \{x \mapsto \{(y', 0.4)\}, t \mapsto \emptyset \mid x \neq t \in Var\}$, and $pts_2 = \{x \mapsto \{(z', 1)\}, t \mapsto \emptyset \mid x \neq t \in Var\}$.

We suppose that the condition $b_1$ in $S_1$ succeeds with probability 0.4. Then we have the following:

• $\nabla((pts, 1/2), (pts_1, 1/2)) = \{x \mapsto \{(y', 0.25)\}, t \mapsto \emptyset \mid x \neq t \in Var\}$,

• $\nabla((pts, 1/2), (pts_2, 1/2)) = \{x \mapsto \{(z', 0.5)\}, t \mapsto \emptyset \mid x \neq t \in Var\}$, and

• $\nabla((pts_1, 1/2), (pts_2, 1/2)) = \{x \mapsto \{(y', 0.25), (z', 0.5)\}, t \mapsto \emptyset \mid x \neq t \in Var\}$.

Clearly, $S_1 : \nabla((pts, 1/2), (pts_2, 1/2)) \rightarrow pts_1$ and $S_2 : \nabla((pts, 1/2), (pts_1, 1/2)) \rightarrow pts_2$. These two judgments constitute the hypotheses for the rule $(par^{prob})$. Therefore using the rule $(par^{prob})$, we can conclude that $S_{par} : pts \rightarrow \nabla((pts_1, 1/2), (pts_2, 1/2))$. The post-type of $S_{par}$ clearly covers all semantic states that can be reached by executing $S_{par}$. Now we give an example of the application of the rule $(par\text{-}if^{prob})$. Let:

• $S_1$ : $x := \&y$,

• $S_2$ : $x := \&z$,

• $S_{par\text{-}if}$ : par-if{($b_1$, $S_1$), (true, $S_2$)}, and

• $pts' = \{x \mapsto \{(y', 0.25), (z', 0.5)\}, t \mapsto \emptyset \mid x \neq t \in Var\}$ and $pts = \{t \mapsto \emptyset \mid t \in Var\}$.

We suppose that the condition $b_1$ succeeds with probability 0.4. By the previous example it should be clear that par{{if $b_1$ then $S_1$ else skip}, {if true then $S_2$ else skip}} : $pts \rightarrow pts'$. This last judgment constitutes the hypothesis for the rule $(par\text{-}if^{prob})$. Therefore using the rule $(par\text{-}if^{prob})$, we can conclude that $S_{par\text{-}if} : pts \rightarrow pts'$. The post-type of $S_{par\text{-}if}$ clearly covers all semantic states that can be reached by executing $S_{par\text{-}if}$. In rules $(whl_1^{prob})$ and $(whl_2^{prob})$, n represents an upper bound for the trip-count of the loop. The post-type of $(whl_2^{prob})$ is an upper bound for the post-types resulting from all numbers of iterations bounded by n.
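To make the compositional use of the rules concrete, the following Python sketch computes post-types for three of the rules only: address assignment, sequencing, and if. The tuple encoding of statements and the names `join` and `post_type` are hypothetical choices made for this illustration; the pointer-dereference, par, and while rules are deliberately left out.

```python
from fractions import Fraction

def join(weighted_types, variables):
    """Weighted upper bound of Definition 4 over {address: probability} maps."""
    result = {}
    for x in variables:
        merged = {}
        for pts, q in weighted_types:
            for z, p in pts.get(x, {}).items():
                if p > 0:
                    merged[z] = merged.get(z, 0) + q * p
        result[x] = merged
    return result

def post_type(stmt, pts, variables):
    """Post-type of stmt from pre-type pts for a small subset of the rules."""
    kind = stmt[0]
    if kind == "addr":   # ("addr", x, y) encodes x := &y
        _, x, y = stmt
        return {**pts, x: {y + "'": Fraction(1)}}
    if kind == "seq":    # ("seq", S1, S2) encodes S1; S2
        _, s1, s2 = stmt
        return post_type(s2, post_type(s1, pts, variables), variables)
    if kind == "if":     # ("if", p, St, Sf), p = probability the condition holds
        _, p, st, sf = stmt
        return join([(post_type(st, pts, variables), p),
                     (post_type(sf, pts, variables), 1 - p)], variables)
    raise ValueError("rule not covered by this sketch: " + kind)

if __name__ == "__main__":
    variables = ["a", "b", "c", "d", "e"]
    bottom = {v: {} for v in variables}  # bottom points-to type as pre-type
    # Lines 1-3 of Figure 1: a := &c; if (...) then b := &c else b := &d
    program = ("seq", ("addr", "a", "c"),
               ("if", Fraction(3, 5), ("addr", "b", "c"), ("addr", "b", "d")))
    print(post_type(program, bottom, variables))
```

Starting from the bottom type, the sketch yields a mapped to c' with probability 1 and b mapped to c' and d' with probabilities 3/5 and 2/5, in line with the entry for the point between lines 3 and 4 in Figure 2.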

The proof of the following lemma is straightforward.

Lemma 3 1. $pts \leq pts' \Longrightarrow (\forall (\gamma, p).\ (\gamma, p) \models pts \Longrightarrow (\gamma, p) \models pts')$.

2. Suppose $e : pts \rightarrow A$ and $(\gamma, p) \models pts$. Then $[\![e]\!]\gamma \in Addrs$ implies $([\![e]\!]\gamma, q) \in A$, for some $q > 0$.

Lemma 3.1 formalizes the soundness of points-to types. Lemma 3.2 shows that for a state of a certain type, if the evaluation of an expression with respect to the state is an address, then this evaluation is surely (with positive probability) approximated by the evaluation of the expression with respect to the type.

The following theorem proves the soundness of the type system. The intended soundness means that the type system respects the operational semantics with respect to the relation $\models$, whose definition is based on probabilities.

Theorem 1 (Soundness) Suppose that $S : pts \rightarrow pts'$, $S : (\gamma, p) \rightsquigarrow (\gamma', p')$, and $(\gamma, p) \models pts$. Then $(\gamma', p') \models pts'$.

Proof: A structural induction on the type derivation can be used to complete the proof of this theorem. Some cases are presented below.

• The case of $(:=^{prob})$: in this case $p' = p$, $pts' = pts[x \mapsto A]$, and $\gamma' = \gamma[x \mapsto [\![e]\!]\gamma]$. Hence by Lemma 3.2, $(\gamma, p) \models pts$ implies $(\gamma', p') \models pts'$.

• The case of $(:= *^{prob})$: in this case for some $z \in Var$, $\gamma(y) = z'$ and $x := z : (\gamma, p) \rightsquigarrow (\gamma', p)$. For some i, $z' = z_i'$, since $(\gamma, p) \models pts$. Hence by assumption $x := z_i : pts \rightarrow pts_i$. Therefore by soundness of $(:=^{prob})$, $(\gamma', p) \models pts_i \leq pts' = pts[x \mapsto \nabla((pts_1, p_1), \ldots, (pts_n, p_n))(x)]$.

• The case of $(* :=^{prob})$: in this case there exists $z \in Var$ such that $\gamma(x) = z'$ and $z := e : (\gamma, p) \rightsquigarrow (\gamma', p)$. For some i, $z' = z_i'$, since $(\gamma, p) \models pts$. Hence by assumption $z_i := e : pts \rightarrow pts_i$. Therefore by soundness of $(:=^{prob})$, $(\gamma', p) \models pts_i \leq pts' = pts[z_i \mapsto \nabla((pts, 1 - p_i), (pts_i, p_i))(z_i) \mid z_i' \in A_{pts}(x)]$.

• The case of $(par^{prob})$: in this case there exist a permutation $\theta : \{1, \ldots, n\} \rightarrow \{1, \ldots, n\}$ and $n + 1$ states $(\gamma_1, p_1), \ldots, (\gamma_{n+1}, p_{n+1})$ such that $(\gamma, p) = (\gamma_1, p_1)$, $(\gamma', p') = (\gamma_{n+1}, \frac{1}{n!} \times p_{n+1})$, and for every $1 \leq i \leq n$, $S_{\theta(i)} : (\gamma_i, p_i) \rightsquigarrow (\gamma_{i+1}, p_{i+1})$. Also $(\gamma_1, p_1) \models pts \leq \nabla((pts, 1/n), (pts_j, 1/n) \mid j \neq \theta(1))$. Therefore by the induction hypothesis $(\gamma_2, p_2) \models pts_{\theta(1)} \leq \nabla((pts, 1/n), (pts_j, 1/n) \mid j \neq \theta(2))$. Again by the induction hypothesis we get $(\gamma_3, p_3) \models pts_{\theta(2)}$. Therefore by a simple induction on n, we can show that $(\gamma_{n+1}, p_{n+1}) \models pts_{\theta(n)} \leq \nabla((pts_1, 1/n), \ldots, (pts_n, 1/n)) = pts'$. This implies $(\gamma', p') = (\gamma_{n+1}, \frac{1}{n!} \times p_{n+1}) \models pts'$.

• The case of $(par\text{-}for^{prob})$: in this case there exists n such that par{{S}, ..., {S}} (n times) : $(\gamma, p) \rightsquigarrow (\gamma', p')$. By the induction hypothesis we have par{{S}, ..., {S}} (n times) : $pts \rightarrow pts'$. Therefore by the soundness of $(par^{prob})$, $(\gamma', p') \models pts'$.

• The case of $(whl_2^{prob})$: in this case there exist $m \leq n$ and $m + 1$ states, $(\gamma_1, p_1), \ldots, (\gamma_{m+1}, p_{m+1})$, such that $(\gamma, p) = (\gamma_1, p_1)$, $(\gamma', p') = (\gamma_{m+1}, p_{m+1})$, and $\forall 1 \leq i \leq m.\ S : (\gamma_i, p_i) \rightsquigarrow (\gamma_{i+1}, p_{i+1})$. By the induction hypothesis we have $(\gamma', p') \models pts_m \leq \nabla((pts_1, 1/n), \ldots, (pts_n, 1/n))$. Therefore $(\gamma', p') \models pts'$ as required.

□

We note that probabilities are mentioned implicitly in Theorem 1. This is in the condition that $(\gamma, p) \models pts$. Some of the implications of this implicit consideration of probabilities are explicit in Lemma 3.2. As an example for the theorem, executing the statement $S_{par}$, defined above, from the semantic state $\gamma = \{t \mapsto 0 \mid t \in Var\}$ may result in the state $\gamma' = \{t \mapsto 0, x \mapsto z' \mid x \neq t \in Var\}$. This happens if $S_2$ is executed after $S_1$. Clearly we have that $\gamma \models \{t \mapsto \emptyset \mid t \in Var\}$ and $\gamma' \models \{x \mapsto \{(y', 0.25), (z', 0.5)\}, t \mapsto \emptyset \mid x \neq t \in Var\}$.
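The check performed in this example can be sketched as follows. The function `models` is a hypothetical encoding of the relation of Definition 3.4, with addresses written as strings such as "z'" and the set Var assumed to be {x, y, z} purely for illustration.

```python
def models(gamma, pts, addrs):
    """Definition 3.4: every address held by a variable in gamma must occur
    with positive probability in the points-to set of that variable."""
    for x, value in gamma.items():
        if value in addrs and pts.get(x, {}).get(value, 0) <= 0:
            return False
    return True

if __name__ == "__main__":
    addrs = {"x'", "y'", "z'"}
    gamma = {"x": 0, "y": 0, "z": 0}            # state before S_par
    gamma_after = {"x": "z'", "y": 0, "z": 0}   # state if S2 runs after S1
    bottom = {"x": {}, "y": {}, "z": {}}
    post = {"x": {"y'": 0.25, "z'": 0.5}, "y": {}, "z": {}}
    print(models(gamma, bottom, addrs), models(gamma_after, post, addrs))
```

Note that the probability component of the state plays no role in the check, which reflects the remark that probabilities enter Theorem 1 only implicitly.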

One attraction of using type systems for program analysis is the relative simplicity of the inference rules. This simplicity is very important where practical implementation is concerned. The simplicity of the rules naturally simplifies the implementation of the rules and hence of the type system. In particular, from experience related to coding similar type systems, we believe that the implementation of the type system presented in this paper is straightforward and efficient in terms of space and time.

RELATED WORK 

493 Analysis of multithreaded programs: 

494 Typically, analyses of multithreaded programs 

495 are classified into two main categories: (a) 

496 techniques that were originally designed for se- 

497 quential programs and later extended to analyze 

498 multithreaded programs and (b) techniques that 

499 were designed specifically for analyzing, opti- 

500 mizing, or correcting multithreaded programs. 

The first category includes flow-insensitive approaches, which provide an easy way to analyze
multithreaded programs by considering all possible combinations of the statements used in a
parallel structure. The drawback of this approach is that it is impractical because of the
huge number of combinations. However, flow-sensitive approaches for sequential programs were
also extended to cover multithreaded programs. Examples of these techniques are constant
propagation [?], code motion, and reaching definitions [?].
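
To give a sense of how quickly the number of combinations grows, the following sketch enumerates the order-preserving interleavings of two threads. It is an illustration only and does not reproduce any of the cited techniques; the function name `interleavings` and the statement labels are assumptions made for the example.

```python
from math import comb

def interleavings(a, b):
    """Yield every interleaving of two statement sequences that preserves
    the order of statements within each thread."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

thread1 = ["s1", "s2"]
thread2 = ["t1", "t2"]
orders = list(interleavings(thread1, thread2))
print(len(orders))   # 6, which equals comb(4, 2)
print(comb(20, 10))  # 184756 interleavings for two 10-statement threads
```

For two threads of m and n statements the count is the binomial coefficient C(m+n, n), so enumeration becomes infeasible even for short parallel blocks.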

The category of techniques that were designed specifically for multithreaded programs
includes deadlock detection, data race detection, and weak memory consistency. A circular
wait for resources usually results in a deadlock situation. Synchronization analysis is a
typical starting point for studying deadlock detection for multithreaded programs. In the
absence of synchronization, if two parallel threads write to the same memory location, a data
race [?] results. Data race analyses aim at eliminating data races, as they are mainly caused
by programmer error. Models of weak memory consistency [?] aim at improving the performance of
hardware. This improvement usually complicates the construction and analysis of parallel
programs.

Probabilistic pointer analysis and speculative optimizations:

Although pointer analysis is a well-established program analysis and many techniques have
been suggested, there is no single technique that is believed to be the best choice. The
trade-off between accuracy and time cost hinders a universal pointer analysis and motivates
application-directed techniques for pointer analysis [?]. A probabilistic pointer analysis
that is flow-sensitive and context-insensitive is presented in [?] for Java programs. While
our work is based on type systems, the work in [?] is based on interprocedural control flow
graphs (ICFG) whose edges are enriched with probabilities. While our work treats multithreaded
programs, the work in [?] treats only sequential programs. Context-sensitive and
control-flow-sensitive pointer analyses [?] are known to be accurate but not scalable. On the
other hand, the context-insensitive, control-flow-insensitive techniques [?] are scalable but
excessively conservative. A convenient mixture of accuracy and scalability is introduced by
some techniques [?] to optimize the trade-off mentioned above. The probabilistic pointer
analysis of a simple imperative language and the pointer analysis of multithreaded programs
were studied in [?] and [?], respectively. However, none of these techniques studies the
probabilistic pointer analysis of multithreaded programs.

Speculative optimizations [?] are considered by many program analyses. A probabilistic
technique for memory disambiguation was proposed in [?]. This technique measures the
probability that two array references alias. Nevertheless, this approach is not suitable for
pointers. By relaxing the safety of the analysis, the work in [?] introduces a pointer
analysis that considers speculation. Another unsafe analysis, which achieves scalability
using transfer functions, is proposed in [?]. The problem with these last two approaches is
that they do not compute the probability information required by speculative optimizations.

Type systems in program analysis: 

There are general algorithms [?] for using type systems to present dataflow analyses that
are monotone and forward or backward. While a way [?] to reason about program pairs using
relational Hoare logic exists, program optimizations [?] presented as type systems also
exist. Type systems were also used to cast safety policies for resource usage, information
flow, and carrying-code abstraction [?]. Proving the soundness of compiler optimizations for
imperative languages using type systems has attracted the interest [?] of many researchers.
Other work studies translating proofs of functional correctness using wp-calculus [?] and
using a Hoare logic. There are other optimizations [?] that boost program quality besides
maintaining program semantics.

Edge and path profiling:

Edge (path) profiling research aims at profiling the edges (paths) of programs. The profiling
process can be done statically or dynamically. Profiling techniques can be classified into:

• Sample-based techniques, which profile representative parts of active edges and paths,
• One-time profiling methods, which profile only part of the execution of the program to cut
down the overhead [?],
• Instrumentation-based techniques [?], which are more convenient for programs with
comparatively predictable behavior, and
• Hardware profiling, which gathers edge profiles using existing hardware for branch
prediction [?].

Using a parallel data-flow diagram [?], many of these techniques are applicable to the
language studied in this paper. In particular, the technique presented in [?], a hybrid
sampling and instrumentation approach, is a convenient choice given its simplicity and power.
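
One plausible use of an edge profile in this setting is to estimate branch probabilities. The sketch below normalizes edge execution counts over the outgoing edges of each node. The function name `edge_probabilities`, the node labels, and the counts are hypothetical; the code does not reproduce any of the cited profilers.

```python
from collections import defaultdict

def edge_probabilities(edge_counts):
    """Turn edge execution counts {(src, dst): count} into probabilities by
    normalizing over the outgoing edges of each source node."""
    totals = defaultdict(int)
    for (src, _dst), count in edge_counts.items():
        totals[src] += count
    return {
        (src, dst): count / totals[src]
        for (src, dst), count in edge_counts.items()
        if totals[src] > 0
    }

# Hypothetical profile of an if statement at node "b1" executed 100 times.
profile = {("b1", "then"): 25, ("b1", "else"): 75, ("then", "join"): 25}
probs = edge_probabilities(profile)
print(probs[("b1", "then")])  # 0.25
print(probs[("b1", "else")])  # 0.75
```

Edges leaving a node that was never executed are dropped rather than assigned a probability, which avoids a division by zero.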